Applications of Sampling and Estimation on Networks
Networks or graphs are fundamental abstractions that allow us to study many important real systems, such as the Web, social networks and scientific collaboration. It is impossible to completely understand these systems and answer fundamental questions related to them without considering the way their components are connected, i.e., their topology. However, topology is not the only relevant aspect of networks. Nodes often have information associated with them, which can be regarded as node attributes or labels. An important problem is then how to characterize a network w.r.t. topology and node label distributions. Another important problem is how to design efficient algorithms to accomplish tasks on networks. Since nodes often have attributes, an interesting avenue for investigation consists in learning and exploiting existing correlations between node and neighbor attributes to accomplish a task more efficiently. One of the challenges faced when studying networks in the wild is that, in general, their topology and the information associated with their nodes cannot be directly obtained. Thus, one must resort to collecting the data, but when obtaining the entire network is infeasible, sampling and estimation are the best option. This dissertation investigates the use of sampling and estimation to characterize networks and to accomplish a particular task. More precisely, we study (i) the problem of characterizing directed and undirected networks through random walk-based sampling, (ii) the problem of estimating the set-size distribution from an information-theoretic standpoint, which has application to characterizing the in-degree distribution in large graphs, and (iii) the problem of searching networks to find nodes that exhibit a specific trait while subject to a sampling budget by learning a model from node attributes and structural properties, which has application to recruiting in social networks.
Evaluating the state-of-the-art in mapping research spaces: a Brazilian case study
Scientific knowledge cannot be seen as a set of isolated fields, but as a
highly connected network. Understanding how research areas are connected is of
paramount importance for adequately allocating funding and human resources
(e.g., assembling teams to tackle multidisciplinary problems). The relationship
between disciplines can be drawn from data on the trajectory of individual
scientists, as researchers often make contributions in a small set of
interrelated areas. Two recent works propose methods for creating research maps
from scientists' publication records: by using a frequentist approach to create
a transition probability matrix; and by learning embeddings (vector
representations). Surprisingly, these models were evaluated on different
datasets and have never been compared in the literature. In this work, we
compare both models in a systematic way, using a large dataset of publication
records from Brazilian researchers. We evaluate these models' ability to
predict whether a given entity (scientist, institution or region) will enter a
new field w.r.t. the area under the ROC curve. Moreover, we analyze how
sensitive each method is to the number of publications and the number of fields
associated with an entity. Last, we conduct a case study to showcase how these
models can be used to characterize science dynamics in the context of Brazil.
Comment: 28 pages, 11 figures
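As a rough illustration of the frequentist approach mentioned in the abstract above, the sketch below counts field-to-field transitions in ordered publication trajectories and row-normalizes the counts into probabilities. The input format, the function name `transition_matrix`, and the field labels are assumptions made for illustration; this is not the implementation used in the works being compared.

```python
from collections import defaultdict


def transition_matrix(trajectories):
    """Estimate P(next field | current field) from ordered field sequences.

    `trajectories` is a hypothetical input format: one chronologically
    ordered list of field labels per scientist.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for fields in trajectories:
        # Count each consecutive (current, next) pair in the trajectory.
        for cur, nxt in zip(fields, fields[1:]):
            counts[cur][nxt] += 1

    probs = {}
    for cur, row in counts.items():
        total = sum(row.values())
        # Row-normalize so the outgoing probabilities of each field sum to 1.
        probs[cur] = {nxt: c / total for nxt, c in row.items()}
    return probs


# Made-up trajectories, for illustration only.
trajectories = [
    ["physics", "physics", "computer science"],
    ["physics", "mathematics"],
    ["mathematics", "computer science"],
]
P = transition_matrix(trajectories)
```

Each row of `P` can then be read as a ranking of the fields an entity currently in that field is most likely to enter next, under the assumptions stated above.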
Helping Fact-Checkers Identify Fake News Stories Shared through Images on WhatsApp
WhatsApp has introduced a novel avenue for smartphone users to engage with
and disseminate news stories. The convenience of forming interest-based groups
and seamlessly sharing content has rendered WhatsApp susceptible to
exploitation by misinformation campaigns. While the process of fact-checking
remains a potent tool in identifying fabricated news, its efficacy falters in
the face of the unprecedented deluge of information generated on the Internet
today. In this work, we explore automatic ranking-based strategies to propose a
"fakeness score" model as a means to help fact-checking agencies identify fake
news stories shared through images on WhatsApp. Based on the results, we design
a tool and integrate it into a real system that has been used extensively for
monitoring content during the 2018 Brazilian general election. Our experimental
evaluation shows that this tool can reduce by up to 40% the amount of effort
required to identify 80% of the fake news in the data when compared to current
mechanisms practiced by the fact-checking agencies for the selection of news
stories to be checked.
Comment: This is a preprint version of a manuscript accepted at the Brazilian
Symposium on Multimedia and the Web (WebMedia). Please consider citing it
instead of this one
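The effort measure reported in the abstract above (how much content must be inspected to find 80% of the fake news) can be sketched as follows. This is a minimal example, assuming items are checked in descending order of a score; the function name `effort_at_recall`, the scores, and the labels are hypothetical and do not come from the paper.

```python
import math


def effort_at_recall(scores, labels, target_recall=0.8):
    """Fraction of items to inspect, in descending score order, to reach
    `target_recall` of the positive (fake) items. Labels are 1 for fake, 0 otherwise.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive items to find")

    # Number of positives needed; the small epsilon guards against
    # floating-point noise in the product before rounding up.
    needed = math.ceil(target_recall * total_pos - 1e-9)

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    found = 0
    for inspected, (_, label) in enumerate(ranked, start=1):
        found += label
        if found >= needed:
            return inspected / len(ranked)


# Made-up scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
effort = effort_at_recall(scores, labels, target_recall=0.8)
```

Comparing this value for a ranking model against the value for a baseline selection order gives a relative effort reduction of the kind the abstract describes.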
Towards Understanding Political Interactions on Instagram
Online Social Networks (OSNs) allow personalities and companies to
communicate directly with the public, bypassing the filters of traditional media.
As people rely on OSNs to stay up-to-date, the political debate has moved
online too. We witness the sudden explosion of harsh political debates and the
dissemination of rumours in OSNs. Identifying such behaviour requires a deep
understanding of how people interact via OSNs during political debates. We
present a preliminary study of interactions in a popular OSN, namely Instagram.
We take Italy as a case study in the period before the 2019 European Elections.
We observe the activity of top Italian Instagram profiles in different
categories: politics, music, sport and show. We record their posts for more
than two months, tracking "likes" and comments from users. Results suggest that
profiles of politicians attract markedly different interactions than other
categories. People tend to comment more, with longer comments, debating for
a longer time, with a large number of replies, most of which are not explicitly
solicited. Moreover, comments tend to come from a small group of very active
users. Finally, we witness substantial differences when comparing profiles of
different parties.
Comment: 5 pages, 8 figures